issue: 3795997 Allow split segment with unacked q #145
Open
iftahl wants to merge 169 commits into Mellanox:vNext from iftahl:split_w_unacked
Signed-off-by: Alexander Grissik <[email protected]>
At most one element of this vector is ever used. Once the rfs constructor completes, there must be exactly one attach_flow_data element in the case of ring_simple. For ring_tap this element remains null. Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alex Briskin <[email protected]>
Set errno to ETIMEDOUT and return -1 from recv when a socket has timed out, instead of returning 0 with errno 0. For instance, on a TCP keepalive timeout. Signed-off-by: Alexander Grissik <[email protected]>
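From the caller's side, this change means a timeout can be told apart from an orderly shutdown by the peer. The sketch below shows one way an application might classify the result of `recv()`. The `RecvStatus` enum and `classify_recv` helper are hypothetical names used only for illustration; they are not part of XLIO.

```cpp
#include <cerrno>
#include <sys/types.h>

// Hypothetical helper (not an XLIO API): classifies the outcome of a recv() call
// on a stream socket that was asked for a non-zero number of bytes.
enum class RecvStatus { Data, PeerClosed, TimedOut, WouldBlock, Error };

// `ret` is the value returned by recv(); `err` is errno captured right after the call.
inline RecvStatus classify_recv(ssize_t ret, int err)
{
    if (ret > 0) {
        return RecvStatus::Data;       // payload bytes were received
    }
    if (ret == 0) {
        return RecvStatus::PeerClosed; // orderly shutdown by the peer
    }
    if (err == ETIMEDOUT) {
        return RecvStatus::TimedOut;   // e.g. TCP keepalive timeout
    }
    if (err == EAGAIN || err == EWOULDBLOCK) {
        return RecvStatus::WouldBlock; // non-blocking socket, no data yet
    }
    return RecvStatus::Error;          // any other failure
}
```

With the previous behavior (return value 0 and errno 0), a timed-out socket would have fallen into the `PeerClosed` branch of such a helper.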
Scan all rpm/deb packages for personal emails, since we should not release packages that contain such emails. The scan covers both the metadata info and the changelog of a given package. Issue: HPCINFRA-919 Signed-off-by: Daniel Pressler <[email protected]>
XLIO Socket API must guarantee that the XLIO_SOCKET_EVENT_TERMINATED is not followed by any other events. Therefore, all the TX completion events must be completed by that moment. Do a polling iteration before calling socket destructor to increase the chance that all the relevant WQEs are completed. This mechanism needs to be improved in the future. Signed-off-by: Dmytro Podgornyi <[email protected]>
xlio_init_ex() changes some default parameters. However, a global object can trigger the safe_mce_sys() constructor at startup. Therefore, we need to re-read the environment variables to guarantee that the changed parameters take effect. Signed-off-by: Dmytro Podgornyi <[email protected]>
Avoid using connect() with sock fd interface, because fd_collection doesn't keep xlio_socket_t objects. Signed-off-by: Dmytro Podgornyi <[email protected]>
xlio_socket_t objects aren't connected to the fd_collection anymore. Therefore, all the methods must be called from the sockinfo_tcp objects directly. Also, xlio_socket_fd() is not relevant anymore and can be removed. Signed-off-by: Dmytro Podgornyi <[email protected]>
Fix iteration over the std::list of TCP sockets when a socket is erased during the iteration. Resolved by advancing the iterator before the erase. Signed-off-by: Iftah Levi <[email protected]>
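The general pattern behind this fix can be sketched as follows. For `std::list`, `erase` invalidates only the iterator to the erased element, so advancing the loop iterator first keeps it valid. `FakeSocket` and `remove_closed` are hypothetical names for illustration and do not come from the XLIO sources.

```cpp
#include <list>

// Hypothetical stand-in for a TCP socket object; not an XLIO type.
struct FakeSocket {
    int id;
    bool closed;
};

// Erases all closed sockets from the list while iterating over it.
// Returns the number of erased elements.
inline int remove_closed(std::list<FakeSocket> &sockets)
{
    int removed = 0;
    for (auto it = sockets.begin(); it != sockets.end();) {
        auto current = it++; // advance first, so `it` stays valid after the erase
        if (current->closed) {
            sockets.erase(current); // invalidates only `current`
            ++removed;
        }
    }
    return removed;
}
```

An equivalent form is `it = sockets.erase(it)`, which works when the erase happens directly in the loop body. Advancing beforehand is useful when the erase is performed indirectly, for example inside a callback invoked on the element.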
rdma-core limits the number of UARs per context to 16 by default. After creating 16 QPs, XLIO receives duplicate blueflame registers for each subsequent QP. As a result, the blueflame doorbell method can write WQEs concurrently without serialization, which leads to data corruption. BlueFlame can hurt throughput, since the copy to the blueflame register is expensive. It can improve latency in some low-latency scenarios; however, XLIO targets high traffic/PPS rates. Removing the blueflame method also slightly improves performance in some scenarios. BlueFlame can be reintroduced in the future to improve low-latency scenarios; however, it will need some rework to avoid the data corruption. Signed-off-by: Dmytro Podgornyi <[email protected]>
The inline WQE branch is not likely in most throughput scenarios. Signed-off-by: Dmytro Podgornyi <[email protected]>
Avoid calling register_socket_timer_event when a socket is already registered (TIME-WAIT). Although this causes no functional issue, it posts events to the internal-thread at too high a rate. This leads to lock contention inside the internal-thread and degrades HTTP CPS performance. Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Gal Noam <[email protected]>
UTLS uses tcp_tx_express() for non-blocking sockets. However, this TX method doesn't support XLIO_RX_POLL_ON_TX_TCP. Additional RX polling improves scenarios such as web servers. Insert RX polling into the UTLS TX path to resolve the performance degradation. Signed-off-by: Dmytro Podgornyi <[email protected]>
In heavy CPS scenarios a socket may enter the TIME-WAIT state and be reused before the internal-thread performs the first TCP timer registration. 1. Setting timer_registered=true while posting the event prevents a second attempt to post the event. 2. Adding a sanity check in add_new_timer that verifies that the socket is not already in the timer map. Signed-off-by: Alexander Grissik <[email protected]>
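The two steps above can be illustrated with a simplified, single-threaded sketch. Only the names `timer_registered` and `add_new_timer` come from the commit message; the `Sock` and `TimerMap` types, the `post_timer_registration` function, and the event counter are assumptions made for illustration and do not reflect the actual XLIO classes or their locking.

```cpp
#include <unordered_set>

// Hypothetical socket model; only the flag name is taken from the commit message.
struct Sock {
    int id;
    bool timer_registered = false;
};

// Hypothetical timer map owned by the internal thread.
struct TimerMap {
    std::unordered_set<int> registered_ids;

    // Step 2: sanity check, refuse a socket that is already in the map.
    bool add_new_timer(const Sock &s)
    {
        return registered_ids.insert(s.id).second;
    }
};

// Step 1: mark the socket as registered at the moment the event is posted,
// not when the internal thread eventually handles it.
// Returns true if an event was actually posted.
inline bool post_timer_registration(Sock &s, int &posted_events)
{
    if (s.timer_registered) {
        return false; // an event is already pending or handled
    }
    s.timer_registered = true;
    ++posted_events;
    return true;
}
```

In this model, a socket that is reused before the internal thread runs cannot post a second registration event, and a duplicate that slips through is still rejected by the map check.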
Added new env parameter - XLIO_MAX_TSO_SIZE. It allows the user to control maximum size of TSO, instead of taking the maximum cap by HW. The default size is 256KB (maximum by current HW). Values higher than HW capabilities won't be taken into account. Signed-off-by: Iftah Levi <[email protected]>
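One way to read the rule above is that the effective TSO size is the smaller of the requested value and the HW cap. The sketch below follows that reading. The function name, the plain byte-count parsing, and the fallback to the default on invalid input are assumptions for illustration; the actual XLIO parsing and accepted value formats may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>

// Default from the commit message: 256KB.
constexpr uint32_t kDefaultMaxTsoSize = 256U * 1024U;

// Hypothetical helper: `env_value` is the raw XLIO_MAX_TSO_SIZE string (may be null),
// `hw_cap` is the maximum TSO size reported by the HW, both in bytes.
inline uint32_t effective_max_tso(const char *env_value, uint32_t hw_cap)
{
    uint32_t requested = kDefaultMaxTsoSize;
    if (env_value != nullptr && *env_value != '\0') {
        char *end = nullptr;
        unsigned long parsed = std::strtoul(env_value, &end, 10);
        // Assumption: accept only a plain positive byte count that fits in 32 bits.
        if (end != env_value && *end == '\0' && parsed > 0 && parsed <= UINT32_MAX) {
            requested = static_cast<uint32_t>(parsed);
        }
    }
    // Values above the HW capability have no effect beyond the cap itself.
    return std::min(requested, hw_cap);
}
```

For example, under these assumptions a request of 64KB on HW that supports 256KB yields 64KB, while a request of 1MB on the same HW yields 256KB.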
Signed-off-by: Gal Noam <[email protected]>
When sock_stats was static, its destructor was called before xlio_exit, which destroys the internal-thread, which in turn destroys sockets. We should avoid global objects with non-trivial constructors/destructors, since there is no control over their execution order. Signed-off-by: Alexander Grissik <[email protected]>
When a TCP socket is destroyed, it frees the preallocated buffers after dst_entry is deleted. This returns the buffers to the global pool directly and breaks the m_tx_num_bufs and m_zc_num_bufs ring counters. 1. Move the preallocated buffers cleanup before dst_entry destruction. 2. Add ring stats for m_tx_num_bufs and m_zc_num_bufs. Signed-off-by: Alexander Grissik <[email protected]>
1. Removing the hardcoded check that switches AIM to latency mode. In case of a low packet rate the calculation will result in a 0 count anyway. In case the packet rate is higher than the desired interrupt rate, we do want to utilize AIM correctly. 2. Changing the default AIM values to more reasonable ones. 3. Removing the default values for Nginx and using AIM by default. This significantly improves CPU utilization in low-congestion cases. Signed-off-by: Alexander Grissik <[email protected]>
These parameters are deprecated and will be removed in the future. Use XLIO_MEMORY_LIMIT instead. Signed-off-by: Dmytro Podgornyi <[email protected]>
MCE_MAX_CQ_POLL_BATCH usage requires it to be small enough. However, this is a logical upper limit and we want to be able to raise it if necessary. Remove the unused cq_mgr_tx::clean_cq(), which uses MCE_MAX_CQ_POLL_BATCH for an array on the stack. Adjust the condition for RX buffers compensation to remove MCE_MAX_CQ_POLL_BATCH. However, this changes the logic: now we forcibly compensate only the last RX buffer in the RQ. Signed-off-by: Dmytro Podgornyi <[email protected]>
MCE_MAX_CQ_POLL_BATCH is a logical upper limit for the CQ polling batch size. There is no hard limitation for it, so raise it to the maximum CQ size. This value can even exceed the CQ size, because the HW continues receiving packets during polling. By default, this change has no effect unless a higher value for XLIO_CQ_POLL_BATCH_MAX is set explicitly. This can be helpful in a scenario where a high traffic rate stops for a long time and the number of packets in an RQ exceeds the batch size. Signed-off-by: Dmytro Podgornyi <[email protected]>
Signed-off-by: Gal Noam <[email protected]>
When the send window is not big enough for the TCP segment to be sent, we may split the segment so that it fits into the window. Before this change, we did not split the segment if there were unacked segments. The motivation was that we expected an ACK for the in-flight segments, which would trigger the next send operation. This flow relies on the RTT for receiving ACKs, which may be delayed depending on the remote side. When the RTT is long, we would block sending even though the TCP send window allows it. The change is to split TCP segments even when there is unacked data, provided the send window is big enough (mss). Signed-off-by: Iftah Levi <[email protected]>
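The behavioral difference can be sketched as two predicates. This is a simplified model, not the actual XLIO/lwip code: the `SendState` struct, the function names, and the reading of "big enough (mss)" as "send window of at least one MSS" are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical snapshot of the TX state; field names do not come from the XLIO sources.
struct SendState {
    uint32_t send_window; // bytes the TCP send window currently allows
    uint32_t segment_len; // bytes in the next segment to send
    uint32_t mss;         // maximum segment size of the connection
    bool has_unacked;     // true when in-flight segments still await an ACK
};

// Before the change: split an oversized segment only when nothing is in flight.
inline bool should_split_before(const SendState &s)
{
    return s.segment_len > s.send_window && s.send_window > 0 && !s.has_unacked;
}

// After the change: also split when data is unacked, as long as the window
// is at least one MSS (assumed threshold).
inline bool should_split_after(const SendState &s)
{
    if (s.segment_len <= s.send_window) {
        return false; // the segment fits as-is, no split needed
    }
    if (!s.has_unacked) {
        return s.send_window > 0; // same as the previous behavior
    }
    return s.send_window >= s.mss;
}
```

For a 64000-byte segment, a 3000-byte window, an MSS of 1460, and unacked data in flight, the first predicate waits for an ACK while the second allows a split, so the sender can use the open window without waiting a full RTT.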
bot:retest
@AlexanderGrissik can we merge it?
AlexanderGrissik approved these changes May 29, 2024
@iftahl, please add statistics, @AlexanderGrissik asked.
AlexanderGrissik force-pushed the vNext branch from cfce2e8 to 51c2340 August 18, 2024 08:58